Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks
We propose a system for calculating a "scaling constant" for the layers and weights of neural networks. We relate this scaling constant to two important quantities that govern the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights have the same scaling constant, will be easier to train. This scaling calculus has a number of consequences, among them that the geometric mean of the fan-in and fan-out, rather than the fan-in, the fan-out, or their arithmetic mean, should be used to set the variance of the weights at initialization. Our system allows for the off-line design and engineering of ReLU neural networks, potentially replacing blind experimentation.
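The initialization rule stated in the abstract can be compared against the usual alternatives with a short NumPy sketch. The abstract only says that the geometric mean of fan-in and fan-out should replace the other averages; the ReLU gain of 2 (borrowed from He-style initialization) and the helper names `init_variance` and `init_weights` are assumptions made here for illustration, not the paper's code.

```python
import numpy as np


def init_variance(fan_in: int, fan_out: int, rule: str = "geometric", gain: float = 2.0) -> float:
    """Return gain / n, where n is the chosen average of fan-in and fan-out.

    ASSUMPTION: gain=2.0 is the standard ReLU factor from He initialization;
    that it carries over unchanged to the geometric-mean rule is a guess.
    """
    if rule == "fan_in":
        n = fan_in
    elif rule == "fan_out":
        n = fan_out
    elif rule == "arithmetic":
        n = (fan_in + fan_out) / 2.0
    elif rule == "geometric":
        n = np.sqrt(fan_in * fan_out)  # geometric mean, as the abstract recommends
    else:
        raise ValueError(f"unknown rule: {rule}")
    return gain / n


def init_weights(fan_in: int, fan_out: int, rule: str = "geometric", seed: int = 0) -> np.ndarray:
    """Draw a (fan_out, fan_in) Gaussian weight matrix with the chosen variance (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(init_variance(fan_in, fan_out, rule))
    # Each row corresponds to one output unit reading fan_in inputs.
    return rng.normal(0.0, std, size=(fan_out, fan_in))


if __name__ == "__main__":
    for rule in ("fan_in", "fan_out", "arithmetic", "geometric"):
        print(rule, init_variance(256, 1024, rule))
```

For a 256-to-1024 layer, the geometric mean (512) lies between the fan-in and the fan-out but below the arithmetic mean (640), so under the same gain it yields a somewhat larger variance than arithmetic-mean averaging.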
Geometric clustering using the information bottleneck method
We argue that K-means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of “smooth” initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classical algorithms.
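The idea of treating the self-consistent optimality equations as an iteration, started from a "smooth" (nearly uniform) assignment, can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the paper's exact equations: it assumes uniform p(i) over data points and replaces the Information Bottleneck divergence term with squared Euclidean distance to soft centroids, which gives the familiar deterministic-annealing update. The function name `ib_geometric_clustering` and its parameters are hypothetical.

```python
import numpy as np


def ib_geometric_clustering(X, n_clusters, beta, n_iter=200, noise=1e-3, seed=0):
    """Iterate soft assignments q(c|i) and centroids to a fixed point.

    ASSUMPTIONS: uniform p(i); the divergence term is approximated by the
    squared distance ||x_i - mu_c||^2 (deterministic-annealing form).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # "Smooth" start: nearly uniform assignments with a tiny random perturbation.
    q = np.full((n, n_clusters), 1.0 / n_clusters)
    q += noise * rng.standard_normal(q.shape)
    q = np.clip(q, 1e-12, None)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_c = q.mean(axis=0)                              # cluster weights p(c)
        mu = (q.T @ X) / q.sum(axis=0)[:, None]           # soft centroids, shape (k, d)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances, (n, k)
        logits = np.log(p_c)[None, :] - beta * d2
        logits -= logits.max(axis=1, keepdims=True)       # subtract row max for numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)                 # normalize to q(c|i)
    return q, mu
```

Larger `beta` sharpens the assignments; in the limit of very large `beta` the update reduces to hard K-means, while below a critical `beta` the iteration stays near the symmetric solution in which all centroids coincide.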